
[GLUTEN-10215][VL] Delta: Native write support for Delta 3.3.1 / Spark 3.5#10801

Merged
dcoliversun merged 10 commits into apache:main from zhztheplayer:wip-delta-write-2 on Oct 15, 2025


zhztheplayer (Member) commented on Sep 25, 2025

Description

The code is derived from the PoC in #10216.

This PR adds native Delta write support by offloading Parquet writing to Velox's native Parquet writer.

This PR only adds support for Spark 3.5 / Delta 3.3.

The remaining TODO items are tracked in #10215.

This PR depends on #10796 and #10802.

Usage

Set spark.gluten.sql.columnar.backend.velox.delta.enableNativeWrite=true to enable the feature. The option is disabled by default.
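A minimal Scala sketch of turning the option on in a SparkSession, assuming the Gluten Velox bundle jar and Delta 3.3.x jars for Spark 3.5 are on the classpath. Apart from the enableNativeWrite key, the settings are the usual Gluten setup (plugin class, off-heap memory) and Delta setup (session extension, catalog); the off-heap size, app name and output path are arbitrary placeholders, not values taken from this PR.

```scala
import org.apache.spark.sql.SparkSession

object DeltaNativeWriteSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("gluten-delta-native-write-sketch") // placeholder name
      .master("local[*]")
      // Load Gluten (assumes the Velox backend bundle is on the classpath).
      .config("spark.plugins", "org.apache.gluten.GlutenPlugin")
      // Gluten runs on off-heap memory; the size here is an arbitrary placeholder.
      .config("spark.memory.offHeap.enabled", "true")
      .config("spark.memory.offHeap.size", "2g")
      // Standard Delta Lake session setup.
      .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
      .config("spark.sql.catalog.spark_catalog",
        "org.apache.spark.sql.delta.catalog.DeltaCatalog")
      // Opt in to native Delta writes (disabled by default).
      .config("spark.gluten.sql.columnar.backend.velox.delta.enableNativeWrite", "true")
      .getOrCreate()

    val path = "/tmp/gluten_delta_native_write_sketch" // hypothetical output path

    // An ordinary Delta write; with the option on, it is a candidate for offloading.
    spark.range(0, 1000).toDF("id")
      .write.format("delta").mode("overwrite").save(path)

    // Print the physical plan of an INSERT into the same table, to check whether
    // the write node carries the Gluten Delta prefix described in the UI section.
    spark.sql(s"EXPLAIN INSERT INTO delta.`$path` SELECT id FROM range(10)")
      .collect()
      .foreach(row => println(row.getString(0)))

    // Read the table back as a basic sanity check.
    println(spark.read.format("delta").load(path).count())

    spark.stop()
  }
}
```

The same keys can equally be passed as --conf flags to spark-submit; setting them on the builder just keeps the sketch self-contained.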

Test

This PR is tested with the unit tests added in #10802.

UI

With this patch, offloaded write operations (typically v1 / v2 write commands) have their node names prefixed with "Gluten Delta", both in the explain output and in the UI.

Vanilla Delta commands

[Screenshot, 2025-10-13 18:12:36]

Gluten Delta commands

[Screenshot, 2025-10-13 18:08:43]

Performance

Currently slower than the vanilla Delta writer.

The implementation still needs optimization: the stats tracker has not yet been offloaded to Velox, and this is a significant source of overhead.

github-actions bot added the VELOX label on Sep 25, 2025
zhztheplayer force-pushed the wip-delta-write-2 branch 3 times, most recently from 5836a54 to c52056e on October 2, 2025 10:43
github-actions bot added the CORE (works for Gluten Core) and DOCS labels on Oct 2, 2025
github-actions bot commented on Oct 2, 2025

Run Gluten Clickhouse CI on x86


zhztheplayer changed the title from "[VL] Delta: Native write support for Delta 3.3.1 / Spark 3.5" to "[GLUTEN-10215][VL] Delta: Native write support for Delta 3.3.1 / Spark 3.5" on Oct 2, 2025

github-actions bot commented on Oct 2, 2025

#10215


dcoliversun (Contributor) left a comment


LGTM. Good job!

dcoliversun merged commit a04dec6 into apache:main on Oct 15, 2025 (57 checks passed)

zhztheplayer (Member, Author) commented on Oct 15, 2025

Thanks for reviewing, @dcoliversun @FelixYBW.

Performance is still slow because the stats visitor has not been offloaded to Velox, but the basic write functionality is covered.

I'll look into offloading the stats visitor going forward.
